{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Location Data Analysis Using Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step 1. Install anacondas. This will give you all the packages you need to do data science. Here is a link http://bit.ly/1NlvHdW\n",
    "\n",
    "Step 2. Install iPython. You don't have to do this, you could use idle or PyCharm or PyDev, but iPython is the best way to write Python code hands down. It's also pretty easy to install (assuming you don't go on mass file deleting sprees like I do..). Here is the link http://bit.ly/1SRw7MP"
   ]
  },
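  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before going further, it's worth checking that the packages this notebook uses can be imported. Below is a minimal sketch using the standard library's `importlib.util.find_spec`, which looks a package up without importing it. The helper name `missing_packages` is my own, and the `required` list only names pandas because that's the only third-party package the code below imports. Add anything else you install."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import importlib.util\n",
    "\n",
    "required = ['pandas']            # third-party packages this notebook imports\n",
    "\n",
    "def missing_packages(names):     # hypothetical helper name\n",
    "    # find_spec returns None when a top-level package cannot be found\n",
    "    return [name for name in names if importlib.util.find_spec(name) is None]\n",
    "\n",
    "missing = missing_packages(required)\n",
    "print('Missing packages:', missing if missing else 'none')"
   ]
  },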
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can start writing code. Import pandas, and create a dataframe using the 'read_csv' function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>zip_code</th>\n",
       "      <th>latitude</th>\n",
       "      <th>longitude</th>\n",
       "      <th>city</th>\n",
       "      <th>state</th>\n",
       "      <th>county</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>601</td>\n",
       "      <td>18.165273</td>\n",
       "      <td>-66.722583</td>\n",
       "      <td>Adjuntas</td>\n",
       "      <td>PR</td>\n",
       "      <td>Adjuntas</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>602</td>\n",
       "      <td>18.393103</td>\n",
       "      <td>-67.180953</td>\n",
       "      <td>Aguada</td>\n",
       "      <td>PR</td>\n",
       "      <td>Aguada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>605</td>\n",
       "      <td>18.465162</td>\n",
       "      <td>-67.141486</td>\n",
       "      <td>Aguadilla</td>\n",
       "      <td>PR</td>\n",
       "      <td>Aguadilla</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>606</td>\n",
       "      <td>18.172947</td>\n",
       "      <td>-66.944111</td>\n",
       "      <td>Maricao</td>\n",
       "      <td>PR</td>\n",
       "      <td>Maricao</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>610</td>\n",
       "      <td>18.288685</td>\n",
       "      <td>-67.139696</td>\n",
       "      <td>Anasco</td>\n",
       "      <td>PR</td>\n",
       "      <td>Anasco</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   zip_code   latitude  longitude       city state     county\n",
       "0       601  18.165273 -66.722583   Adjuntas    PR   Adjuntas\n",
       "1       602  18.393103 -67.180953     Aguada    PR     Aguada\n",
       "2       605  18.465162 -67.141486  Aguadilla    PR  Aguadilla\n",
       "3       606  18.172947 -66.944111    Maricao    PR    Maricao\n",
       "4       610  18.288685 -67.139696     Anasco    PR     Anasco"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "cities = pd.read_csv(\"/Users/alexwoods/Desktop/PeurtoRico.csv\") # data - cities of peurto rico\n",
    "cities.head()       # shows all columns, and first 5 rows."
   ]
  },
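  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One thing to watch in the output above: the zip codes show up as 601, 602, and so on. pandas parsed that column as integers, which drops the leading zeros (Puerto Rico zip codes are really 00601, 00602, ...). If you want zip codes as labels rather than numbers, a sketch like the one below keeps them as text. It reads two rows copied from the table above out of an inline string so the cell runs on its own. I can't tell whether your CSV file stores the leading zeros, so the cell also pads each code with `str.zfill(5)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "# two rows copied from the output above, as an inline CSV\n",
    "sample = io.StringIO(\n",
    "    'zip_code,latitude,longitude,city,state,county\\n'\n",
    "    '601,18.165273,-66.722583,Adjuntas,PR,Adjuntas\\n'\n",
    "    '602,18.393103,-67.180953,Aguada,PR,Aguada\\n'\n",
    ")\n",
    "\n",
    "# dtype=str stops pandas from converting the column to integers\n",
    "df = pd.read_csv(sample, dtype={'zip_code': str})\n",
    "\n",
    "# the file may already lack the zeros, so pad every code to five digits\n",
    "df['zip_code'] = df['zip_code'].str.zfill(5)\n",
    "print(df['zip_code'].tolist())   # ['00601', '00602']"
   ]
  },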
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "We can use the head() function as shown above to get a feel for the dataset. We can also use the count() function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "zip_code     89\n",
       "latitude     89\n",
       "longitude    89\n",
       "city         89\n",
       "state        89\n",
       "county       89\n",
       "dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cities.count()"
   ]
  },
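  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Every column reports 89, so nothing is missing in this dataset. Because count() skips null values, comparing it with len() gives you a quick missing-data check. The toy DataFrame below uses made-up values, not the dataset, and shows the difference when one latitude is missing. `isnull().sum()` gives the number of missing values per column directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# toy data with one missing latitude (None becomes NaN in a float column)\n",
    "toy = pd.DataFrame({'city': ['A', 'B', 'C'],\n",
    "                    'latitude': [18.1, None, 18.4]})\n",
    "\n",
    "print(len(toy))             # 3 rows in total\n",
    "print(toy.count())          # city 3, latitude 2: NaN is not counted\n",
    "print(toy.isnull().sum())   # missing values per column: city 0, latitude 1"
   ]
  },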
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below you can see how we index the column and row. This is really useful."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Adjuntas'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cities['city']        # to access the whole 'city' column\n",
    "cities['city'].ix[0]  # to access just the first row of the city column - Adjuntas"
   ]
  },
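  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "pandas has two explicit row indexers. `.iloc` selects by position and `.loc` selects by index label. Older code also used `.ix`, which mixed the two and was removed in pandas 1.0. On this dataset the two give the same answer because the index is just 0, 1, 2, .... The sketch below uses three city names from the dataset with made-up index labels, so the difference shows. It also adds a boolean filter, which is the usual way to pick out rows that match a condition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# made-up index labels, so positions and labels differ\n",
    "df = pd.DataFrame({'city': ['Adjuntas', 'Aguada', 'Aguadilla']},\n",
    "                  index=[10, 20, 30])\n",
    "\n",
    "print(df['city'].iloc[0])   # position 0 -> Adjuntas\n",
    "print(df.loc[20, 'city'])   # label 20 -> Aguada\n",
    "\n",
    "# boolean filtering: rows whose city name starts with 'Agua'\n",
    "print(df[df['city'].str.startswith('Agua')])"
   ]
  },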
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is something I like to do if I'm planning on running more computer science like algorithms on the data (perhaps a greedy algorithm, or something else that the dataset lends itself to). What I'm talking about is make an object for each row (only if appropriate!).\n",
    "\n",
    "So I create a standard python class, and pass in row number to the constructor, because that's how I'm going to create an array of these objects. I'm using 'getters' instead of accessing the data members directly, just because that's a good practice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "class City():\n",
    "    def __init__(self, rowNum):\n",
    "        self.name = cities['city'].ix[rowNum]             # the most important attribute!\n",
    "        self.zipCode = cities['zip_code'].ix[rowNum]\n",
    "        self.latitude = cities['latitude'].ix[rowNum]\n",
    "        self.longitude = cities['longitude'].ix[rowNum]\n",
    "        self.county = cities['county'].ix[rowNum]\n",
    "        \n",
    "    def getName(self):\n",
    "        return self.name\n",
    "    \n",
    "    def getZipCode(self):\n",
    "        return self.zipCode\n",
    "    \n",
    "    def getLat(self):\n",
    "        return self.latitude\n",
    "    \n",
    "    def getLong(self):\n",
    "        return self.longitude\n",
    "    \n",
    "    def getCounty(self):\n",
    "        return self.county   \n",
    "     \n",
    "    # we should always have a string representation of the object    \n",
    "    def show(self):                                     \n",
    "        string = \"City = \" + self.getName() + \"\\n\" + \"Latitude = \" + str(self.getLat()) + \"\\n\" + \"Longitude = \" + str(self.getLong()) + \"\\n\"\n",
    "        print(string)"
   ]
  },
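  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On Python 3.7 or newer (this notebook was written on 3.4), the standard library's `dataclasses` module can generate the constructor, `__repr__`, and `__eq__` for a simple record like this. Here's a minimal sketch of that alternative. `CityRecord` is a hypothetical name. It takes the values directly instead of a row number, so it doesn't depend on the global `cities` DataFrame. The sample values are the first row of the table above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from dataclasses import dataclass\n",
    "\n",
    "@dataclass(frozen=True)          # frozen: fields can't be changed after creation\n",
    "class CityRecord:                # hypothetical alternative to City\n",
    "    name: str\n",
    "    zip_code: int\n",
    "    latitude: float\n",
    "    longitude: float\n",
    "    county: str\n",
    "\n",
    "    def __str__(self):\n",
    "        # same three-line format as City.show()\n",
    "        return 'City = {}\\nLatitude = {}\\nLongitude = {}\\n'.format(\n",
    "            self.name, self.latitude, self.longitude)\n",
    "\n",
    "adjuntas = CityRecord('Adjuntas', 601, 18.165273, -66.722583, 'Adjuntas')\n",
    "print(adjuntas)"
   ]
  },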
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The show() function prints a nice string representation of the city object. Below I'm going to create an array of and fill it with the whole dataset, so it will be easier to run an algorithm through it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "City = Adjuntas\n",
      "Latitude = 18.165273\n",
      "Longitude = -66.722583\n",
      "\n",
      "City = Aguada\n",
      "Latitude = 18.393103\n",
      "Longitude = -67.180953\n",
      "\n",
      "City = Aguadilla\n",
      "Latitude = 18.465162\n",
      "Longitude = -67.141486\n",
      "\n",
      "City = Maricao\n",
      "Latitude = 18.172947\n",
      "Longitude = -66.944111\n",
      "\n",
      "City = Anasco\n",
      "Latitude = 18.288685\n",
      "Longitude = -67.139696\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# now I'm going to make an array of cities. The point of all this is to make running \n",
    "# algorithms on the dataset easier on myself.\n",
    "\n",
    "places = []                       # already used the name 'cities'...\n",
    "\n",
    "for i in range(cities['city'].count()):       # the method inside will return row \n",
    "     temp = City(i)                           # length for the 'city' column.\n",
    "     places.append(temp)\n",
    "    \n",
    "\n",
    "for j in range(5):           # the data in it's new array of objects format!\n",
    "    places[j].show()"
   ]
  },
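  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looking each value up by row number works, but pandas can also hand you whole rows. `itertuples()` yields one namedtuple per row, with the column names as attributes. The sketch below assumes the column names of the CSV above and uses a two-row excerpt of the table as its DataFrame. `make_record` is a hypothetical helper that builds a plain dict, and you could build City-style objects the same way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# two-row excerpt of the table above\n",
    "sample = pd.DataFrame({\n",
    "    'zip_code': [601, 602],\n",
    "    'latitude': [18.165273, 18.393103],\n",
    "    'longitude': [-66.722583, -67.180953],\n",
    "    'city': ['Adjuntas', 'Aguada'],\n",
    "    'state': ['PR', 'PR'],\n",
    "    'county': ['Adjuntas', 'Aguada'],\n",
    "})\n",
    "\n",
    "def make_record(row):             # hypothetical helper; row is a namedtuple\n",
    "    return {'name': row.city, 'lat': row.latitude, 'long': row.longitude}\n",
    "\n",
    "# index=False leaves the DataFrame index out of each tuple\n",
    "records = [make_record(row) for row in sample.itertuples(index=False)]\n",
    "print([r['name'] for r in records])   # ['Adjuntas', 'Aguada']"
   ]
  },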
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we're going to get a little more complicated. I want to calculate the distance in between two cities (nodes) and then write a function that finds the closest city for any given city!\n",
    "\n",
    "note - The Haversine formula below, you can ignore that. It's the mathmatical way to calculate distance between two points of longitude and latitude. It's one of those things you google when you need it then never use or remember it again. It is, however, critical to our distance function.\n",
    "\n",
    "Notice that when I write the function to find the closest city, I'm extremely careful to make sure that it doesn't ever compare to itself. This is because if it did, it would pick itself every time, making the algorithm useless. This is important to take into account if you are writing any route planning algorithms, like the one I want you to try in the challenge part."
   ]
  },
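  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, this is the formula the code below implements. Here $\\varphi$ is latitude and $\\lambda$ is longitude (both converted to radians), and $r$ is the Earth's radius (6371 km):\n",
    "\n",
    "$$a = \\sin^2\\left(\\frac{\\Delta\\varphi}{2}\\right) + \\cos\\varphi_1 \\cos\\varphi_2 \\sin^2\\left(\\frac{\\Delta\\lambda}{2}\\right), \\qquad d = 2r \\arcsin\\left(\\sqrt{a}\\right)$$"
   ]
  },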
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# now some other methods that might be useful for analysis on these cities!\n",
    "\n",
    "# the haversine formula is a way to calculate distance between a longitude and latitude.\n",
    "# this code is via - http://bit.ly/1bKauqS\n",
    "# don't look to into it unless you love geography...\n",
    "from math import radians, cos, sin, asin, sqrt\n",
    "\n",
    "def haversine(lon1, lat1, lon2, lat2):\n",
    "    \"\"\"\n",
    "    Calculate the great circle distance between two points \n",
    "    on the earth (specified in decimal degrees)\n",
    "    \"\"\"\n",
    "    # convert decimal degrees to radians \n",
    "    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])\n",
    "\n",
    "    # haversine formula \n",
    "    dlon = lon2 - lon1 \n",
    "    dlat = lat2 - lat1 \n",
    "    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2\n",
    "    c = 2 * asin(sqrt(a)) \n",
    "    r = 6371 # Radius of earth in kilometers. Use 3956 for miles\n",
    "    return c * r\n",
    "\n",
    "\n",
    "# a distance function to make my life easier\n",
    "def distance(a, b):                  # 'a', and 'b' will just be City objects!\n",
    "    return haversine(a.getLong(), a.getLat(), b.getLong(), b.getLat())\n",
    "\n",
    "\n",
    "# what if I want to know the closest city?\n",
    "import random\n",
    "\n",
    "def findClosestCity(a):\n",
    "    start = places[random.randrange(0, 89)]        \n",
    "    while start == a:                         # if I don't make sure it can't be itself, \n",
    "        start = places[random.randrange(0, 89)]      # it will pick itself every time.\n",
    "\n",
    "    champDistance = distance(a, start)        # the distance we will \"challenge\"\n",
    "    closest = start\n",
    "    \n",
    "    for i in places:\n",
    "        testDistance = distance(a, i)\n",
    "        if testDistance < champDistance and not a == i:\n",
    "            closest = i\n",
    "            champDistance = testDistance      # now it will be the thing to challenge.\n",
    "    \n",
    "    return closest\n"
   ]
  },
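  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a quick way to sanity-check the haversine function. One degree of latitude along a meridian should come out to 2π × 6371 / 360 ≈ 111.19 km. A point's distance to itself should be 0. Swapping the two points shouldn't change the result. The sketch below repeats the function so the cell runs on its own, then checks those properties using the Adjuntas and Aguada coordinates from the table above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from math import radians, cos, sin, asin, sqrt, pi\n",
    "\n",
    "def haversine(lon1, lat1, lon2, lat2):\n",
    "    # same formula as above: great-circle distance in km\n",
    "    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])\n",
    "    dlon = lon2 - lon1\n",
    "    dlat = lat2 - lat1\n",
    "    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2\n",
    "    return 2 * asin(sqrt(a)) * 6371\n",
    "\n",
    "# one degree of latitude vs. the circumference / 360\n",
    "one_degree = haversine(0, 0, 0, 1)\n",
    "print(round(one_degree, 2), round(2 * pi * 6371 / 360, 2))   # both 111.19\n",
    "\n",
    "# a point's distance to itself\n",
    "print(haversine(-66.722583, 18.165273, -66.722583, 18.165273))   # 0.0\n",
    "\n",
    "# Adjuntas to Aguada, in both directions\n",
    "d1 = haversine(-66.722583, 18.165273, -67.180953, 18.393103)\n",
    "d2 = haversine(-67.180953, 18.393103, -66.722583, 18.165273)\n",
    "print(round(d1, 1), round(d2, 1))"
   ]
  },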
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "City = Mayaguez\n",
      "Latitude = 18.219023\n",
      "Longitude = -67.508068\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# now let's test some of the functionality of what we just coded!!!\n",
    "# let's find a location to start at. 35 is a randomish number..\n",
    "places[35].show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "City = Rincon\n",
      "Latitude = 18.335781\n",
      "Longitude = -67.252547\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# ok, Mayaguez it is!\n",
    "Mayaguez = places[35]\n",
    "closeToMaya = findClosestCity(Mayaguez)\n",
    "\n",
    "closeToMaya.show()               # here's google directions to check - http://bit.ly/1BTTyks"
   ]
  },
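  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "findClosestCity is a linear scan that keeps the best candidate found so far. Python's built-in min() with a key function performs the same scan more compactly. The sketch below runs on five cities whose coordinates are copied from the outputs above. The `coords` dict and the `nearest` helper are hypothetical, and haversine is repeated so the cell runs on its own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from math import radians, cos, sin, asin, sqrt\n",
    "\n",
    "# (latitude, longitude) for a few cities, copied from the outputs above\n",
    "coords = {\n",
    "    'Mayaguez': (18.219023, -67.508068),\n",
    "    'Rincon': (18.335781, -67.252547),\n",
    "    'Aguada': (18.393103, -67.180953),\n",
    "    'Anasco': (18.288685, -67.139696),\n",
    "    'Maricao': (18.172947, -66.944111),\n",
    "}\n",
    "\n",
    "def haversine(lon1, lat1, lon2, lat2):\n",
    "    # great-circle distance in km, same formula as above\n",
    "    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])\n",
    "    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2\n",
    "    return 2 * asin(sqrt(a)) * 6371\n",
    "\n",
    "def nearest(name):                               # hypothetical helper\n",
    "    lat, lon = coords[name]\n",
    "    others = [c for c in coords if c != name]    # never compare a city to itself\n",
    "    return min(others, key=lambda c: haversine(lon, lat, coords[c][1], coords[c][0]))\n",
    "\n",
    "print(nearest('Mayaguez'))   # Rincon, the same answer findClosestCity gave above"
   ]
  },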
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "City = Rincon\n",
      "Latitude = 18.335781\n",
      "Longitude = -67.252547\n",
      "\n"
     ]
    }
   ],
   "source": [
    "places[33].show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Challenge - write a greedy algorithm to try and find the minimum travel time to ten locations. Start in Mayaguez. I'll post the answer soon enough."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
